13 research outputs found

    The Architecture of the XtreemOS Grid Checkpointing Service

    The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed heterogeneous environment. Such an environment may span millions of grid nodes using different system-specific checkpointers, each saving and restoring application and kernel data structures on a grid node. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies that can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to any other grid middleware or distributed OS.
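The idea of a common checkpointer API can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `Checkpointer`, `InMemoryCheckpointer`, and `checkpoint_job` are hypothetical and are not taken from XtreemGCP, and the in-memory backend merely stands in for a system-specific checkpointer.

```python
from abc import ABC, abstractmethod


class Checkpointer(ABC):
    """Hypothetical uniform interface; not the actual XtreemGCP API."""

    @abstractmethod
    def checkpoint(self, job_id: str) -> str:
        """Save the job's state and return a checkpoint identifier."""

    @abstractmethod
    def restart(self, checkpoint_id: str) -> str:
        """Restore a job from a checkpoint and return its job id."""


class InMemoryCheckpointer(Checkpointer):
    """Stand-in backend; a real one would wrap a system-specific checkpointer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store = {}  # checkpoint id -> job id

    def checkpoint(self, job_id: str) -> str:
        checkpoint_id = f"{self.name}:{job_id}:{len(self._store)}"
        self._store[checkpoint_id] = job_id
        return checkpoint_id

    def restart(self, checkpoint_id: str) -> str:
        return self._store[checkpoint_id]


def checkpoint_job(job_id, node_checkpointers):
    """Checkpoint one job on every node through the same interface."""
    return {node: cp.checkpoint(job_id) for node, cp in node_checkpointers.items()}
```

The caller in `checkpoint_job` never needs to know which backend runs on which node, which is the uniformity the abstract describes.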

    Independent Checkpointing in a Heterogeneous Grid Environment

    The EU-funded XtreemOS project implements an open-source grid operating system based on Linux. To provide fault tolerance and migration for grid applications, it integrates a distributed grid-checkpointing service called XtreemGCP. This service is designed to support different checkpointing protocols and to address the underlying grid-node checkpointers (e.g. BLCR, LinuxSSI, OpenVZ) transparently through a uniform interface. In this paper, we present the integration of an independent checkpointing and rollback-recovery protocol into XtreemGCP. The solution we propose is not bound to a particular checkpointer and can thus be used transparently on top of any grid-node checkpointer. To evaluate the prototype, we run it within a heterogeneous environment composed of single-PC nodes and a Single System Image (SSI) cluster. The experimental results demonstrate the capability of the XtreemGCP service to integrate different checkpointing protocols and to independently checkpoint a distributed application within a heterogeneous grid environment. Moreover, the performance evaluation shows that our solution outperforms the existing coordinated checkpointing protocol in terms of scalability.

    Massively Multiuser Virtual Environments using Object Based Sharing

    Massively multiuser virtual environments (MMVEs) are becoming increasingly popular, with millions of users. Commercial implementations typically rely on a traditional client/server architecture that controls the virtual world state of shared data at a central point. Message passing mechanisms are used to communicate state changes to the clients. For scalability reasons, our approach creates and deploys MMVEs in a peer-to-peer (P2P) fashion. We use standard Java technology, implementing only a few basic data-centric operations for the management of our distributed objects. Higher consistency models can easily be implemented using these basic operations. Currently, we have implemented transactional consistency, offering convenient and consistent access to the shared scene graph. In this paper we describe our basic object model and the prototype implementation TGOS (Typed Grid Object Sharing). Furthermore, we discuss preliminary measurements with the virtual world Wissenheim executed on top of TGOS.

    Distributed Architecture for a Peer-to-Peer-Based Virtual Microscope

    Part 2: Work-in-Progress Papers. Virtual microscopes are commonly used in medical education. They provide a platform for distributing whole slide images (WSI), each several GB in size, to students exploring them. Even in courses with a few hundred students and dozens of WSI the network traffic may be high, and it will increase vastly when the system is opened to access from the Internet. The same applies to user-generated content such as interactive annotations (each student generates approx. 200 labels per term). In a collection of several thousand WSI that need to be annotated for training or quiz-based purposes, there will be millions of user contributions. In an abstract view, users navigate through a universe of WSI and annotations and may meet other users watching the same or related WSI. This paper presents a distributed architecture built on PathFinder for Internet-based virtual microscopy, addressing the challenges of distributing tightly connected data chunks on an overlay network consisting of random graphs.

    Compiler Support for Reference Tracking in a Type-Safe DSM

    The efficiency of language implementations is heavily influenced by the selected strategy for allocating and reclaiming memory. Memory allocation in a distributed shared memory (DSM) cluster poses additional challenges.

    Checkpointing Process Groups in a Grid Environment

    The EU-funded XtreemOS project implements a grid operating system that transparently exploits the resources of virtual organizations through the standard POSIX interface. Grid checkpointing and restart requires saving and restoring jobs executing in a distributed heterogeneous grid environment. Such an environment may span millions of grid nodes (PCs, clusters, and mobile devices) using different system-specific checkpointers, each saving and restoring application and kernel data structures for processes executing on a grid node. In this paper we briefly describe the XtreemOS grid checkpointing architecture and how we bridge the gap between the abstract grid and the system-specific checkpointers. We then discuss how we keep track of processes and how different process grouping techniques are managed to ensure that all processes of a job, and any further dependent ones, can be checkpointed and restarted. Finally, we present how Linux control groups can be used to address resource isolation issues during restart.
